First steps in the logic-based assessment of 
post-composed phenotypic descriptions 



E. Jimenez-Ruiz^1,3, B. Cuenca Grau^2, R. Berlanga^1, and Dietrich 
Rebholz-Schuhmann^3 

^1 Universitat Jaume I de Castello, Spain {ejimenez,berlanga}@uji.es 
^2 University of Oxford, UK {berg}@comlab.ox.ac.uk 
^3 European Bioinformatics Institute, Cambridge, UK {jimenez,rebholz}@ebi.ac.uk 



Abstract. In this paper we present a preliminary logic-based evaluation 
of the integration of post-composed phenotypic descriptions with domain 
ontologies. The evaluation has been performed using a description logic 
reasoner together with scalable techniques: ontology modularization and 
approximations of the logical difference between ontologies. 

1 Introduction 

A phenotype is defined as a basic observable characteristic of an organism. Thus, 
a set of phenotypic descriptions may involve different domains and granularities 
ranging from molecular to organism level. 

Phenotypic descriptions have been recently described by means of termi- 
nological resources, with the Human Phenotype Ontology (HPO) [1] being a 
prominent example. The HPO ontology represents a so-called pre-composed de- 
scription: it does not provide explicit links between the phenotypic descriptions 
(e.g. increased calcium concentration in blood) and the relevant entities 
associated with it, such as the chemical element involved ("calcium"), the way in 
which it is involved ("increased concentration") and where it appears ("blood"). 
Post-composed phenotypic descriptions intend to provide a more formal repre- 
sentation to interoperate with involved entities [2] and to allow more powerful 
reasoning. Nevertheless, the formal representation of phenotypic descriptions is 
still a challenge [3, 4] owing to the complex nature of some phenotypes and the 
lack of consensus among clinicians to describe them in a standard way. 

Mungall et al. [3] and Hoehndorf et al. [4] have recently proposed automatic 
and semi-automatic methods to transform pre-composed phenotypic descrip- 
tions into a description logic (DL) based post-composed representation linked 
to domain ontologies. The integration of domain ontologies with post-composed 
phenotypic descriptions presents new challenges since most of the involved 
ontologies are developed independently and may conceptualize the same entities 
differently. Therefore, this integration may not always lead to 
the expected and proper logical consequences [5, 6]. In this paper we present 
first steps towards the logic-based assessment of the integration of phenotypic 
descriptions with domain ontologies. 




Fig. 1. An excerpt from the post-composed phenotypic descriptions of HPOpc 

2 Method and preliminary results 

Our experiments have been based on a post-composed version (from now on 
HPOpc) of the HPO ontology^1, obtained by applying the method from [3]. The HPO ontology 
only provides a classification of pre-composed phenotypic descriptions (e.g. see 
left hand side of Figure 1), whereas HPOpc also provides explicit links to relevant 
domain entities (see right hand side of Figure 1). HPOpc contains 11382 entities 
and uses external concepts from different domain ontologies, including PATO [7] 
(264 concepts), Cell Ontology (12 concepts), GO (96 concepts), FMA [8] (812 
concepts), CHEBI (33 concepts), and other OBO foundry ontologies [9]. 

A DL reasoner may be used to reclassify HPO concepts, according to the 
knowledge of HPOpc and linked ontologies, and get new interesting knowledge. 
However, as stated in [3], reasoning with HPOpc and all linked ontologies is 
time-consuming. To mitigate this limitation, we have extracted a locality-based 
module [10] for each set of referenced external entities. For example, the 
module for FMA contains 2044 concepts, which is much easier to reason with than 
the whole FMA (around 80000 concepts). We have then built HPOall^2 by merging 
HPOpc with the corresponding modules from the referenced ontologies. The 
classification of HPOall using HermiT [12] takes around 45 seconds on a 2 GB 
laptop. 
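
As a rough illustration of the module extraction step, the following Python 
sketch implements a toy version of syntactic bottom-locality-based extraction 
for a restricted fragment in which every axiom has an atomic left-hand side 
(A ⊑ B or A ⊑ ∃R.B). In that fragment an axiom is bottom-local exactly when its 
left-hand side lies outside the signature, so the module can be computed as a 
simple fixpoint. The choice of bottom-locality, the Axiom type, the function 
extract_module and the miniature anatomy ontology are all assumptions made for 
illustration; this is not the implementation of [10], nor the data used in the 
experiments. 

```python
from typing import NamedTuple, Optional


class Axiom(NamedTuple):
    """A ⊑ B (role is None) or A ⊑ ∃role.B; the left-hand side is atomic."""
    sub: str
    sup: str
    role: Optional[str] = None


def extract_module(ontology, seed_signature):
    """Collect the axioms that are not bottom-local w.r.t. the growing signature."""
    signature = set(seed_signature)
    module = set()
    changed = True
    while changed:
        changed = False
        for axiom in ontology:
            # With an atomic left-hand side, the axiom is bottom-local exactly
            # when that atom is outside the current signature: skip it then.
            if axiom in module or axiom.sub not in signature:
                continue
            module.add(axiom)
            # The symbols of a non-local axiom extend the signature.
            signature.add(axiom.sup)
            if axiom.role is not None:
                signature.add(axiom.role)
            changed = True
    return module


# Hypothetical miniature "anatomy ontology"; names are illustrative only.
toy_anatomy = [
    Axiom("Blood", "BodySubstance"),
    Axiom("BodySubstance", "AnatomicalEntity"),
    Axiom("Heart", "Organ"),
    Axiom("Organ", "AnatomicalEntity"),
    Axiom("Heart", "Thorax", role="part_of"),
]

# Only the axioms reachable from the referenced entity "Blood" are kept.
module = extract_module(toy_anatomy, {"Blood"})
```

Seeding the extraction with only the externally referenced entities is what 
keeps the module small: axioms about Heart and Organ are never pulled in 
because nothing in the seed signature depends on them. 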

New subsumption relationships between HPO concepts may represent both 
desired new knowledge and unintended consequences. In order to evaluate the 
new logical consequences that hold in HPOall, we have borrowed the notion of logical 
difference from [13]. The logical difference between two ontologies contains the 
set of consequences that are inferred in one of the ontologies but not in the 
other. Unfortunately, there is no algorithm for computing the logical difference 
in expressive DLs. Moreover, the number of inferences in the difference may 
be infinite. Thus, we have reused the approximations of the logical difference 
presented in previous work [5], where inferences are one of the following simple 
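kinds of axiom: (i) $A \sqsubseteq B$, (ii) $A \sqsubseteq \neg B$, (iii) $A \sqsubseteq \exists R.B$, (iv) $A \sqsubseteq \forall R.B$, and 
(v) $R \sqsubseteq S$ ($A$, $B$ are atomic concepts, including $\top$ and $\bot$, and $R$, $S$ are atomic roles). 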
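
The approximation for inferences of kind (i) can be pictured as a set 
difference between the atomic subsumptions entailed by two ontologies. The 
Python sketch below is a minimal illustration under strong assumptions: 
ontologies are reduced to sets of told atomic subsumptions, and a transitive 
closure stands in for the DL reasoner, which in practice also has to account for 
complex class expressions. All function names and the concept names P1, P2, E1, 
E2 are hypothetical. 

```python
def entailed_subsumptions(told):
    """Transitive closure of told (sub, sup) pairs; a stand-in for a reasoner."""
    closure = set(told)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # Chain a ⊑ b and b ⊑ d into a ⊑ d, skipping trivial a ⊑ a.
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def subsumption_difference(base, merged, concepts_of_interest):
    """Atomic subsumptions entailed by `merged` but not by `base`,
    restricted to pairs whose two sides are both concepts of interest."""
    new = entailed_subsumptions(merged) - entailed_subsumptions(base)
    return {
        (a, b)
        for (a, b) in new
        if a in concepts_of_interest and b in concepts_of_interest
    }


# Hypothetical data: P1, P2 play the role of phenotype concepts,
# E1, E2 the role of external domain concepts.
phenotype_axioms = {("P1", "E1"), ("E2", "P2")}
module_axioms = {("E1", "E2")}

new_inferences = subsumption_difference(
    base=phenotype_axioms,
    merged=phenotype_axioms | module_axioms,
    concepts_of_interest={"P1", "P2"},
)
```

In this toy setting the single external axiom E1 ⊑ E2 is enough to produce a 
new subsumption between the two phenotype concepts, which is the kind of 
reclassification that then has to be judged as intended or unintended. 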

^1 Available from http://bioonto.de/obo2owl/hpo-in-owl.owl 

^2 We have converted the OBO ontologies to OWL using the OWLDEF method [11], 
and we have normalized the involved concept and property URIs. 

Fig. 2. Explanation for new equivalence between concepts HP_0000969 (Edema) and 
HP_0007430 (Generalized edema). With concept IDs (left) and concept names (right). 

Fig. 3. Explanation for new subsumption relationship between concepts GO_0008544 
(Epidermis development) and FMA_67175 (Anatomical entity) 



The logical difference between HPOall and HPOpc, affecting only HPO concepts, 
contains 759 new subsumption relationships (inferences of type (i)). The 
integration thus indeed leads to a reclassification of HPO concepts. For 
example, HPOall infers the probably unintended consequence Generalized edema 
$\equiv$ Edema, which did not hold in HPOpc. As shown in the Protege-like 
explanation in Figure 2, the new knowledge from FMA leads to this consequence. 

The logical difference also contains 80 new entailments that relate concepts 
from domain ontologies (i.e. new cross-references). For example, the GO concept 
Epidermis development is classified under the FMA concept Anatomical entity. 
This consequence is probably not intended; it is due to the definition of 
range axioms in FMA (see Figure 3) and the use of the property part-of in 
different scopes (in FMA it relates anatomical entities, whereas in GO it 
relates biological processes). Additionally, if a larger approximation of the 
logical difference is considered (i.e. entailments of types (ii)-(v)), new 
consequences are also obtained, e.g. 
$\text{GO\_0030308} \sqsubseteq \exists \text{negatively\_regulates}.\text{GO\_0040007}$, 
where GO_0030308 stands for Negative regulation of cell growth and GO_0040007 
stands for Growth. 



3 Conclusions and future work 

The benefits of integrating phenotypic descriptions with domain ontologies have 
already been discussed in the literature [2, 3, 4]. However, the consequences of the 
integration should be evaluated by domain experts in order to detect potential 
unintended consequences. 



In this paper we have performed a preliminary evaluation^3 in which 
state-of-the-art techniques (e.g. ontology reasoning, ontology modularization, 
logical difference) have been reused to extract the set of new consequences that 
arise when integrating post-composed phenotypic descriptions, such as those 
provided by HPOpc, with domain ontologies. In the near future, we intend to 
develop a system to guide the expert in the detection and repair of unintended 
consequences, as in our previous tool ContentMap [5], in which we assessed the 
integration of ontologies through mappings. 

Moreover, domain ontologies contain cross-references (i.e. mappings) which 
have not been considered in this preliminary assessment. These new 
correspondences will probably lead to new consequences that should be assessed. 
Thus, we also intend to adapt the techniques proposed in [6] to this new setting. 

^3 HPOall and related domain ontology modules are available at: 
http://krono.act.uji.es/people/Ernesto/phenotypeassessment/ 



